reference price
No-Regret Learning in Dynamic Competition with Reference Effects Under Logit Demand
This work studies algorithm design in a competitive framework, with the primary goal of learning a stable equilibrium. We consider dynamic price competition between two firms operating within an opaque marketplace, where each firm lacks information about its competitor. Demand follows the multinomial logit (MNL) choice model and depends on both the price consumers observe and their reference price; consecutive periods of the repeated game are connected by reference price updates. We use the notion of stationary Nash equilibrium (SNE), defined as the fixed point of the equilibrium pricing policy for the single-period game, to capture long-run market equilibrium and stability simultaneously. We propose the online projected gradient ascent algorithm (OPGA), in which the firms adjust prices using the first-order derivatives of their log-revenues, which can be obtained from the market feedback mechanism. Despite the absence of properties typically required for convergence in online games, such as strong monotonicity and variational stability, we demonstrate that under diminishing step-sizes the price and reference price paths generated by OPGA converge to the unique SNE, thereby achieving no-regret learning and a stable market. Moreover, with appropriate step-sizes, we prove that this convergence exhibits a rate of O(1/t).
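The price adjustment described in the abstract can be sketched as follows. This is a minimal illustration, not the paper's implementation: the utility form a_i - b_i p_i + c_i (r_i - p_i), the outside option with utility 0, the exponential-smoothing reference update with memory parameter alpha, the step size 1/t, the price interval [p_lo, p_hi], and all function and parameter names are assumptions made for the example.

```python
import math

def mnl_demand(p, r, a, b, c):
    """MNL market shares under an ASSUMED utility a_i - b_i*p_i + c_i*(r_i - p_i),
    with an outside option of utility 0 (assumption)."""
    u = [a[i] - b[i] * p[i] + c[i] * (r[i] - p[i]) for i in range(len(p))]
    m = max(0.0, max(u))  # shift exponents for numerical stability
    weights = [math.exp(ui - m) for ui in u]
    denom = math.exp(-m) + sum(weights)  # exp(-m) is the outside option
    return [w / denom for w in weights]

def log_revenue_grad(p, r, a, b, c):
    """d/dp_i of log(p_i * d_i) under the assumed utility form.
    Firm i needs only its own price, its own sensitivities, and its own
    observed market share d_i, never the competitor's price."""
    d = mnl_demand(p, r, a, b, c)
    return [1.0 / p[i] + (b[i] + c[i]) * (d[i] - 1.0) for i in range(len(p))]

def opga(p0, r0, a, b, c, alpha=0.5, p_lo=0.1, p_hi=20.0, T=2000):
    """Projected gradient ascent on log-revenue with a diminishing step size.
    alpha, p_lo, p_hi and the 1/t schedule are illustrative assumptions."""
    p, r = list(p0), list(r0)
    for t in range(1, T + 1):
        eta = 1.0 / t  # diminishing step size
        g = log_revenue_grad(p, r, a, b, c)
        # Gradient step followed by projection onto [p_lo, p_hi].
        p_next = [min(p_hi, max(p_lo, p[i] + eta * g[i])) for i in range(len(p))]
        # Reference price moves toward the price just posted (assumed rule).
        r = [alpha * r[i] + (1.0 - alpha) * p[i] for i in range(len(p))]
        p = p_next
    return p, r
```

Under these assumptions, a stationary point requires p = r and a zero log-revenue gradient for both firms. For the symmetric parameters a = [1, 1], b = [1, 1], c = [0.5, 0.5], the point p = r = [1, 1] satisfies both conditions, which gives a simple check on `log_revenue_grad`.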
Appendices for No-regret Learning in Price Competitions under Consumer Reference Effects
A Expanded Literature Review
There are also very recent works that address the dynamic pricing problem with consumer reference effects under uncertain demand. Nevertheless, these two lines of work are oblivious to consumer reference effects. In contrast to these two papers, our work studies price competitions over an infinite time horizon where reference prices adjust over time, and provides theoretical guarantees for the convergence of pricing strategies under the partial information setting. In their setting, the subgradient for each bidder's objective is a function of all bidders' decisions as well as its budget rate (i.e. total fixed budget divided by a given time horizon), which can be
B.1 Proof of Theorem 3.1
(i) By first order conditions, we know that arg max We now follow a similar proof to that of Tarski's fixed point theorem: consider the set Note that convergence is monotonic because U(·) is nondecreasing. This implies that under Assumption 1, the interior SNE is unique.
We first thank all reviewers for their thoughtful comments, and we wish everyone health during these hard times
We acknowledge the simplicity of our linear demand and reference price update models. These references are also discussed in Section 2 of the paper. The gradient of revenue can be calculated using estimated elasticity, observed sales (i.e. Assumption 1 is invoked in all theorems and lemmas of Section 5, and we will clearly state this in the revised paper. In the proof of Lemma 3.2, we show that This means if firms are willing to consider both prices near zero and those sufficiently large, Assumption 1 holds.
Review for NeurIPS paper: No-regret Learning in Price Competitions under Consumer Reference Effects
Summary and Contributions: This paper studies a multi-period model in which, in each period, each of two firms posts a price for its product. Consumer demand for each product is linear in both prices and, in addition, linear in the reference price, which captures past prices. Specifically, the reference price is a weighted average of the previous-period reference price and the two previous-period posted prices. The firms do not know the specific demand function and have access only to the derivative of their revenue (the function that maps a price to demand times price), denoted by g_i(p_i). Note that g_i(·) depends on the other parameters, but the firm treats them as constants: it can feed g_i a candidate price p_i and obtain the revenue derivative at that price.
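The model summarized above can be illustrated with a short sketch. The coefficient names (alpha, beta, delta, gamma), the sign convention of the linear demand, and the reference price weights are hypothetical choices for this example; the review only states that demand is linear in both prices and the reference price, and that the reference price is a weighted average.

```python
def linear_demand(p_own, p_other, r, alpha, beta, delta, gamma):
    """ASSUMED linear demand: decreasing in own price, increasing in the
    competitor's price and in the reference price."""
    return alpha - beta * p_own + delta * p_other + gamma * r

def revenue(p_own, p_other, r, alpha, beta, delta, gamma):
    """Revenue is demand times price."""
    return p_own * linear_demand(p_own, p_other, r, alpha, beta, delta, gamma)

def revenue_derivative(p_own, p_other, r, alpha, beta, delta, gamma):
    """g_i(p_i): derivative of revenue in the firm's own price, with the
    competitor's price and the reference price held constant."""
    return alpha - 2.0 * beta * p_own + delta * p_other + gamma * r

def update_reference(r, p1, p2, w_ref=0.6, w1=0.2, w2=0.2):
    """Next reference price as a weighted average of the previous reference
    price and both posted prices. The weights are illustrative only."""
    if abs(w_ref + w1 + w2 - 1.0) > 1e-9:
        raise ValueError("weights must sum to 1")
    return w_ref * r + w1 * p1 + w2 * p2
```

In this sketch a firm would call `revenue_derivative` as a black-box oracle at a candidate price, mirroring the review's description of g_i, without needing to know the demand coefficients itself.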